data science
- 2024-10-25: The Human Dimension to Clean, Distributable, and Documented Data Science Code
- 2024-08-16: It's time to try out pixi!
- 2024-08-02: A survey of how to use protein language models for protein design: Part 2
- 2024-07-14: Conference report: SciPy 2024
- 2024-07-02: Use native formats when storing data
- 2024-06-18: Headache-free, portable, and reproducible handling of data access and versioning
- 2024-05-16: How to control PyMOL from Jupyter notebooks
- 2024-05-12: Paper Review: Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences
- 2024-05-05: Data Science in the Biotech Research Organization
- 2024-04-17: How LLMs can accelerate data science
- 2024-04-07: pyds-cli version 0.4.0 released!
- 2024-04-05: How to grow software development skills in a data science team
- 2024-03-23: How to organize and motivate a biotech data science team
- 2024-02-25: How to keep sharp with technical skills as a data science team lead
- 2024-02-18: Dashboard-ready data is often machine learning-ready data
- 2024-02-07: Success Factors for Data Science Teams in Biotech
- 2024-01-28: Exploratory data analysis isn’t open-ended
- 2023-12-12: Classes? Functions? Both?
- 2023-12-11: Elevating Team Performance: Feedback Strategies for Data Science Leaders
- 2023-10-14: How I made a local pre-commit hook to resize images
- 2023-10-07: How to choose a (conda) distribution of Python
- 2023-10-05: Shape Up and Data Science: A Match Closer to Agile Than You Think
- 2023-09-30: How automating git workflows improves data scientists
- 2023-09-23: How to automatically write git commit messages
- 2023-09-09: Article Review: 4 Skills the Next Generation of Data Scientists Needs to Develop
- 2023-09-06: Interviewing Data Science Candidates with Code Reviews
- 2023-08-28: Service vs. Product-Oriented Data Science
- 2023-08-04: Skating Ahead of the Puck: A Wayne Gretzky Approach to Tech Adoption
- 2023-07-24: Climbing the Tech Ladder or Staying Grounded? A Guide to Navigating Data Science Innovations
- 2023-05-13: How to Craft Stellar Pull Request Summaries with GPT-4
- 2023-02-05: Building a Translation App with GPT-3: The Story Behind My Creation
- 2022-09-16: Coding on an iPad with Codespaces and Blink
- 2022-04-02: Matrices and their connection to graphs
- 2022-04-01: Functional over object-oriented style for pipeline-esque code
- 2022-03-31: Everything gets a package? Yes, everything gets a package.
- 2021-11-28: What candidates can and cannot control in their job hunt
- 2021-09-30: Career FAQ
- 2021-09-12: Machine-Directed Evolution
- 2021-08-26: Hiring data scientists at Moderna! (2021)
- 2021-07-12: One killer way to burst to the cloud from your laptop
- 2021-07-10: How to enable custom source package installation in Binder
- 2021-05-14: Set environment variables inside a Jupyter notebook
- 2021-04-26: Publishing data with Datasette
- 2021-03-01: Machine learning system design
- 2021-01-27: Experience with M1 MacBook Air
- 2020-12-24: Moving my CI pipelines to GitHub Actions
- 2020-10-15: Fermi estimation and Bayesian priors
- 2020-10-06: Why giving talks is important for writing
- 2020-09-30: Tools to help you write consistent Python code
- 2020-09-12: Add a direct Binder link for built HTML notebooks
- 2020-09-07: Faster iteration over dataframes
- 2020-08-30: Pandera, Data Validation, and Statistics
- 2020-08-21: Software Engineering as a Research Practice
- 2020-07-26: Data/Software Challenges as Tools for Hiring
- 2020-07-11: Jupyter notebooks as scripts
- 2020-06-28: Statistical tests are just canned model comparisons
- 2020-06-15: What's the most optimal way to learn Bayesian statistics?
- 2020-06-14: P(A,B) to P(H,D)??
- 2020-06-02: Data Science Design Manual
- 2020-04-21: Use pyprojroot and Python’s pathlib to manage your data paths!
- 2020-04-12: Introducing a new essay on Markov models
- 2020-03-25: Resources for learning Python during COVID-19
- 2020-03-15: What can data scientists do during COVID-19?
- 2020-02-13: One Weird Trick to Speed Up Your TensorFlow Model 100X...
- 2020-01-18: Create your own auto-publishing slides with reveal-md and Travis CI
- 2020-01-16: PyData Ann Arbor Meetup: Testing for Data Science
- 2020-01-07: FastAPI: Flask-like generator of web APIs
- 2020-01-04: Build your digital profile as a data scientist
- 2020-01-02: On automating principled statistical analyses
- 2019-12-26: Serving multiple Panel apps together
- 2019-12-19: Simplifying Uncertainty Responsibly
- 2019-12-15: A Review of the Python Data Science Dashboarding Landscape in 2019
- 2019-11-09: Principled Git-based Workflow in Collaborative Data Science Projects
- 2019-10-31: Reimplementing and Testing Deep Learning Models
- 2019-10-30: Code review in data science
- 2019-10-29: "AI will not solve medicine"
- 2019-10-05: Jupyter Server with HTTPS on Personal Server
- 2019-09-07: Dokku: Building an internal Heroku at work
- 2019-07-26: PyViz Panel Apps
- 2019-07-23: T-distributed likelihoods are kind of neat
- 2019-07-15: SciPy 2019 Post-Conference
- 2019-07-07: Order of magnitude is more than accurate enough
- 2019-06-15: Graphs and Matrices
- 2019-05-29: Reasoning about Shapes and Probability Distributions
- 2019-05-10: Context Switching
- 2019-05-10: PyCon 2019 Tutorial and Conference Days
- 2019-04-29: PyCon 2019 Pre-Journey
- 2019-03-24: Variance Explained
- 2019-03-22: Functools Partial
- 2019-03-20: How I Work
- 2019-03-01: Pair Coding: Why and How for Data Scientists
- 2019-01-28: Minimum Viable Products (MVPs) Matter
- 2018-12-25: Conda hacks for data science efficiency
- 2018-12-16: Gaussian Process Notes
- 2018-12-09: Mathematical Intuition
- 2018-11-13: Solving Problems Actionably
- 2018-11-07: Bayesian Modelling is Hard Work!
- 2018-10-26: More Dask: Pre-Scattering Data
- 2018-10-11: Parallel Processing with Dask on GridEngine Clusters
- 2018-09-04: Optimizing Block Sparse Matrix Creation with Python
- 2018-08-07: Joint, conditional, and marginal probability distributions
- 2018-08-06: d-separation in causal inference
- 2018-08-01: nxviz 0.5 released!
- 2018-07-27: pyjanitor 0.3 released!
- 2018-07-16: Bayesian Estimation, Group Comparison, and Workflow
- 2018-07-14: ECDFs
- 2018-06-05: My Latent Dissatisfaction with Modern ML
- 2018-05-06: Model Baselines Are Important
- 2018-03-30: Consolidate your scripts using click
- 2018-02-28: Lessons learned and reinforced from writing my own deep learning package
- 2018-02-21: nxviz first PR merged!
- 2018-02-20: Deep Learning and the Importance of a Good Teacher
- 2018-02-13: Data scientists need to write good APIs
- 2018-02-07: Bayesian Inference & Testing Sets
- 2018-01-29: Refactor Notebook Code
- 2018-01-18: PyMC3 docs + Weibull patches merged!
- 2018-01-08: Bayesian Uncertainty: A More Nuanced View
- 2017-12-13: Visual Studio Code: A New Microsoft?
- 2017-11-03: Boston Bayesians Talk: An Attempt at Demystifying Bayesian Deep Learning
- 2017-10-31: Always Check Your Data
- 2017-10-27: Random Forests: A Good Default Model?
- 2017-10-22: Network Propagation
- 2017-10-07: A Data Scientist's Guide to Environment Variables
- 2017-08-31: What would be useful for aspiring data scientists to know?
- 2017-08-10: Next Steps
- 2017-08-02: Open Source Software
- 2017-07-22: Bayesian Neural Networks
- 2017-07-15: Insight Week 7
- 2017-07-08: Insight Week 6
- 2017-07-01: Insight Week 5
- 2017-06-30: Using Bokeh in FluForecaster
- 2017-06-24: Insight Week 4
- 2017-06-17: Insight Week 3
- 2017-06-10: Insight Week 2
- 2017-06-02: Insight Week 1
- 2017-05-13: Why I Teach Coding Tutorials
- 2017-04-21: Moving on from MIT
- 2017-02-08: Numba: My first attempt at being serious with it
- 2017-01-05: On Learning Math
- 2016-12-20: Female Doctors are Better than Male Doctors - For Real?
- 2016-10-23: Reinventing Statistical Language in Science
- 2016-10-09: Reproducible PI Manifesto
- 2016-09-27: Boxplot or Violin Plot?
- 2016-09-16: Why Uncertainty Matters
- 2016-08-16: The problem of too many splits?
- 2016-08-06: Variational Inference With PyMC3: A Lightweight Demo
- 2016-07-24: Sparse Matrix Multiplication in Python 3
- 2016-07-16: nxviz: A NetworkX Visualization Package
- 2016-07-16: Principles of Network Visualization
- 2016-06-22: In Defence of Extreme Openness
- 2016-06-10: Abstractions
- 2016-06-09: Network Science and Statistics: Applications and Fundamentals
- 2016-05-22: ODSC East
- 2016-03-13: R for Statistics, Python for Data Processing?
- 2016-02-09: scikit-learn tutorial
- 2015-12-03: Reticulate Evolution and Microbial Ecology
- 2015-09-28: Predicting HIV Drug Resistance Phenotype from Genotype
- 2015-09-03: In Which I Trained A Neural Network :)
- 2015-06-02: Thoughts on Open Data Science Conference
- 2015-05-30: How to do Testing as a Practice in Data Analysis
- 2015-04-03: Semantic Versioning for Papers: A Manifesto